Dataset statistics
| Number of variables | 27 |
|---|---|
| Number of observations | 517737 |
| Missing cells | 8093719 |
| Missing cells (%) | 57.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 106.7 MiB |
| Average record size in memory | 216.0 B |
Variable types
| CAT | 21 |
|---|---|
| NUM | 4 |
| UNSUPPORTED | 2 |
Warnings
| Warning | Type |
|---|---|
| BID has a high cardinality: 133980 distinct values | High cardinality |
| StartDt has a high cardinality: 385 distinct values | High cardinality |
| EndDt has a high cardinality: 366 distinct values | High cardinality |
| PID has a high cardinality: 5012 distinct values | High cardinality |
| AttendingPhysician has a high cardinality: 74109 distinct values | High cardinality |
| OperatingPhysician has a high cardinality: 28532 distinct values | High cardinality |
| OtherPhysician has a high cardinality: 44388 distinct values | High cardinality |
| DiagnosisCode_1 has a high cardinality: 10354 distinct values | High cardinality |
| DiagnosisCode_2 has a high cardinality: 5056 distinct values | High cardinality |
| DiagnosisCode_3 has a high cardinality: 4448 distinct values | High cardinality |
| DiagnosisCode_4 has a high cardinality: 3925 distinct values | High cardinality |
| DiagnosisCode_5 has a high cardinality: 3412 distinct values | High cardinality |
| DiagnosisCode_6 has a high cardinality: 2968 distinct values | High cardinality |
| DiagnosisCode_7 has a high cardinality: 2635 distinct values | High cardinality |
| DiagnosisCode_8 has a high cardinality: 2260 distinct values | High cardinality |
| DiagnosisCode_9 has a high cardinality: 1894 distinct values | High cardinality |
| DiagnosisCode_10 has a high cardinality: 495 distinct values | High cardinality |
| AdmitDiagnosisCode has a high cardinality: 3715 distinct values | High cardinality |
| ProcedureCode_4 is highly correlated with AmtReimbursed and 3 other fields | High correlation |
| AmtReimbursed is highly correlated with ProcedureCode_4 | High correlation |
| ProcedureCode_1 is highly correlated with ProcedureCode_4 | High correlation |
| ProcedureCode_2 is highly correlated with ProcedureCode_4 | High correlation |
| ProcedureCode_3 is highly correlated with ProcedureCode_4 | High correlation |
| OperatingPhysician has 427120 (82.5%) missing values | Missing |
| OtherPhysician has 322691 (62.3%) missing values | Missing |
| DiagnosisCode_1 has 10453 (2.0%) missing values | Missing |
| DiagnosisCode_2 has 195380 (37.7%) missing values | Missing |
| DiagnosisCode_3 has 314480 (60.7%) missing values | Missing |
| DiagnosisCode_4 has 392141 (75.7%) missing values | Missing |
| DiagnosisCode_5 has 443393 (85.6%) missing values | Missing |
| DiagnosisCode_6 has 468981 (90.6%) missing values | Missing |
| DiagnosisCode_7 has 484776 (93.6%) missing values | Missing |
| DiagnosisCode_8 has 494825 (95.6%) missing values | Missing |
| DiagnosisCode_9 has 502899 (97.1%) missing values | Missing |
| DiagnosisCode_10 has 516654 (99.8%) missing values | Missing |
| ProcedureCode_1 has 517575 (> 99.9%) missing values | Missing |
| ProcedureCode_2 has 517701 (> 99.9%) missing values | Missing |
| ProcedureCode_3 has 517733 (> 99.9%) missing values | Missing |
| ProcedureCode_4 has 517735 (> 99.9%) missing values | Missing |
| ProcedureCode_5 has 517737 (100.0%) missing values | Missing |
| ProcedureCode_6 has 517737 (100.0%) missing values | Missing |
| AdmitDiagnosisCode has 412312 (79.6%) missing values | Missing |
| AmtReimbursed is highly skewed (γ1 = 34.61136686) | Skewed |
| ProcedureCode_3 is uniformly distributed | Uniform |
| ProcedureCode_4 is uniformly distributed | Uniform |
| CID has unique values | Unique |
| ProcedureCode_5 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| ProcedureCode_6 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| AmtReimbursed has 19568 (3.8%) zeros | Zeros |
| DeductibleAmt has 496701 (95.9%) zeros | Zeros |
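The "Missing cells" figures in the dataset statistics can be cross-checked against the per-column missing counts reported in this document. The sketch below is only an arithmetic check on numbers copied from the report; the variable names are illustrative, and the AttendingPhysician count (1396) comes from that variable's own section because it is too small to trigger a warning.

```python
# Figures copied from the report's dataset statistics.
n_variables = 27
n_observations = 517_737
reported_missing_cells = 8_093_719

# Per-column missing counts as reported (all other columns have 0 missing).
missing_per_column = {
    "AttendingPhysician": 1_396,
    "OperatingPhysician": 427_120,
    "OtherPhysician": 322_691,
    "DiagnosisCode_1": 10_453,
    "DiagnosisCode_2": 195_380,
    "DiagnosisCode_3": 314_480,
    "DiagnosisCode_4": 392_141,
    "DiagnosisCode_5": 443_393,
    "DiagnosisCode_6": 468_981,
    "DiagnosisCode_7": 484_776,
    "DiagnosisCode_8": 494_825,
    "DiagnosisCode_9": 502_899,
    "DiagnosisCode_10": 516_654,
    "ProcedureCode_1": 517_575,
    "ProcedureCode_2": 517_701,
    "ProcedureCode_3": 517_733,
    "ProcedureCode_4": 517_735,
    "ProcedureCode_5": 517_737,
    "ProcedureCode_6": 517_737,
    "AdmitDiagnosisCode": 412_312,
}

total_missing = sum(missing_per_column.values())
total_cells = n_variables * n_observations          # every row has one cell per variable
missing_pct = 100 * total_missing / total_cells

print(total_missing == reported_missing_cells)      # True
print(f"{missing_pct:.1f}%")                        # 57.9%
```

Most of the missing cells come from the sparsely populated DiagnosisCode_4 to DiagnosisCode_10 and ProcedureCode_1 to ProcedureCode_6 columns.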
Reproduction
| Analysis started | 2020-10-13 16:29:29.546891 |
|---|---|
| Analysis finished | 2020-10-13 16:30:09.800344 |
| Duration | 40.25 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
Variables
BID
| Distinct | 133980 |
|---|---|
| Distinct (%) | 25.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
| BENE118316 | 29 |
|---|---|
| BENE42721 | 29 |
| BENE59303 | 27 |
| BENE63544 | 27 |
| BENE63504 | 27 |
| Other values (133975) |
| Value | Count | Frequency (%) | |
| BENE118316 | 29 | < 0.1% | |
| BENE42721 | 29 | < 0.1% | |
| BENE59303 | 27 | < 0.1% | |
| BENE63544 | 27 | < 0.1% | |
| BENE63504 | 27 | < 0.1% | |
| BENE143400 | 27 | < 0.1% | |
| BENE36330 | 26 | < 0.1% | |
| BENE44241 | 26 | < 0.1% | |
| BENE87248 | 25 | < 0.1% | |
| BENE158374 | 25 | < 0.1% | |
| Other values (133970) | 517469 | 99.9% |
Frequencies of value counts
Unique
| Unique | 33631 |
|---|---|
| Unique (%) | 6.5% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 9 |
| Mean length | 9.400927498 |
| Min length | 9 |
CID
| Distinct | 517737 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
| CLM622424 | 1 |
|---|---|
| CLM674546 | 1 |
| CLM291347 | 1 |
| CLM579979 | 1 |
| CLM287987 | 1 |
| Other values (517732) |
| Value | Count | Frequency (%) | |
| CLM622424 | 1 | < 0.1% | |
| CLM674546 | 1 | < 0.1% | |
| CLM291347 | 1 | < 0.1% | |
| CLM579979 | 1 | < 0.1% | |
| CLM287987 | 1 | < 0.1% | |
| CLM376031 | 1 | < 0.1% | |
| CLM361422 | 1 | < 0.1% | |
| CLM327383 | 1 | < 0.1% | |
| CLM495354 | 1 | < 0.1% | |
| CLM472387 | 1 | < 0.1% | |
| Other values (517727) | 517727 | > 99.9% |
Frequencies of value counts
Unique
| Unique | 517737 |
|---|---|
| Unique (%) | 100.0% |
Histogram of lengths of the category
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 8.99996137 |
| Min length | 8 |
StartDt
| Distinct | 385 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
| 2009-03-03 | 1574 |
|---|---|
| 2009-03-21 | 1567 |
| 2009-01-31 | 1566 |
| 2009-04-25 | 1550 |
| 2009-02-16 | 1549 |
| Other values (380) |
| Value | Count | Frequency (%) | |
| 2009-03-03 | 1574 | 0.3% | |
| 2009-03-21 | 1567 | 0.3% | |
| 2009-01-31 | 1566 | 0.3% | |
| 2009-04-25 | 1550 | 0.3% | |
| 2009-02-16 | 1549 | 0.3% | |
| 2009-05-07 | 1548 | 0.3% | |
| 2009-03-08 | 1545 | 0.3% | |
| 2009-06-10 | 1544 | 0.3% | |
| 2009-06-07 | 1539 | 0.3% | |
| 2009-05-01 | 1537 | 0.3% | |
| Other values (375) | 502218 | 97.0% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
EndDt
| Distinct | 366 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
| 2009-03-03 | 1563 |
|---|---|
| 2009-03-21 | 1561 |
| 2009-04-23 | 1554 |
| 2009-05-01 | 1548 |
| 2009-06-20 | 1542 |
| Other values (361) |
| Value | Count | Frequency (%) | |
| 2009-03-03 | 1563 | 0.3% | |
| 2009-03-21 | 1561 | 0.3% | |
| 2009-04-23 | 1554 | 0.3% | |
| 2009-05-01 | 1548 | 0.3% | |
| 2009-06-20 | 1542 | 0.3% | |
| 2009-03-08 | 1541 | 0.3% | |
| 2009-05-13 | 1540 | 0.3% | |
| 2009-02-16 | 1539 | 0.3% | |
| 2009-01-31 | 1537 | 0.3% | |
| 2009-03-30 | 1536 | 0.3% | |
| Other values (356) | 502276 | 97.0% |
Frequencies of value counts
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
PID
| Distinct | 5012 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
| PRV51459 | 8240 |
|---|---|
| PRV53797 | 4739 |
| PRV51574 | 4444 |
| PRV53918 | 3588 |
| PRV54895 | 3433 |
| Other values (5007) |
| Value | Count | Frequency (%) | |
| PRV51459 | 8240 | 1.6% | |
| PRV53797 | 4739 | 0.9% | |
| PRV51574 | 4444 | 0.9% | |
| PRV53918 | 3588 | 0.7% | |
| PRV54895 | 3433 | 0.7% | |
| PRV55215 | 3250 | 0.6% | |
| PRV56011 | 2833 | 0.5% | |
| PRV52064 | 2806 | 0.5% | |
| PRV55004 | 2396 | 0.5% | |
| PRV57306 | 2315 | 0.4% | |
| Other values (5002) | 479693 | 92.7% |
Frequencies of value counts
Unique
| Unique | 200 |
|---|---|
| Unique (%) | < 0.1% |
Histogram of lengths of the category
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
AmtReimbursed
| Distinct | 342 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 286.3347993 |
|---|---|
| Minimum | 0 |
| Maximum | 102500 |
| Zeros | 19568 |
| Zeros (%) | 3.8% |
| Memory size | 4.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 10 |
| Q1 | 40 |
| median | 80 |
| Q3 | 200 |
| 95-th percentile | 1500 |
| Maximum | 102500 |
| Range | 102500 |
| Interquartile range (IQR) | 160 |
Descriptive statistics
| Standard deviation | 694.0343433 |
|---|---|
| Coefficient of variation (CV) | 2.423856076 |
| Kurtosis | 4172.177736 |
| Mean | 286.3347993 |
| Median Absolute Deviation (MAD) | 50 |
| Skewness | 34.61136686 |
| Sum | 148246120 |
| Variance | 481683.6696 |
| Monotonicity | Not monotonic |
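Several of the descriptive statistics above are derived from one another. The following sketch, which only uses figures copied from the AmtReimbursed tables (the variable names are illustrative), recomputes the mean, coefficient of variation, variance, IQR and range so the relationships are explicit.

```python
# Figures copied from the AmtReimbursed section of the report.
n_observations = 517_737
total = 148_246_120          # "Sum"
mean = 286.3347993           # "Mean"
std_dev = 694.0343433        # "Standard deviation"
q1, q3 = 40, 200             # "Q1", "Q3"
minimum, maximum = 0, 102_500

derived = {
    "mean = sum / n": total / n_observations,
    "CV = std / mean": std_dev / mean,
    "variance = std ** 2": std_dev ** 2,
    "IQR = Q3 - Q1": q3 - q1,
    "range = max - min": maximum - minimum,
}

for label, value in derived.items():
    print(f"{label}: {value:.10g}")
```

A coefficient of variation above 2, together with a median (80) far below the mean, is consistent with the "highly skewed" warning for this column.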
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 100 | 52943 | 10.2% | |
| 10 | 42461 | 8.2% | |
| 200 | 41594 | 8.0% | |
| 60 | 40762 | 7.9% | |
| 30 | 33919 | 6.6% | |
| 40 | 33616 | 6.5% | |
| 50 | 31293 | 6.0% | |
| 20 | 27960 | 5.4% | |
| 80 | 25095 | 4.8% | |
| 70 | 24412 | 4.7% | |
| Other values (332) | 163682 | 31.6% |
| Value | Count | Frequency (%) | |
| 0 | 19568 | 3.8% | |
| 10 | 42461 | 8.2% | |
| 20 | 27960 | 5.4% | |
| 30 | 33919 | 6.6% | |
| 40 | 33616 | 6.5% |
| Value | Count | Frequency (%) | |
| 102500 | 1 | < 0.1% | |
| 101250 | 1 | < 0.1% | |
| 95580 | 1 | < 0.1% | |
| 85680 | 1 | < 0.1% | |
| 84660 | 1 | < 0.1% |
AttendingPhysician
| Distinct | 74109 |
|---|---|
| Distinct (%) | 14.4% |
| Missing | 1396 |
| Missing (%) | 0.3% |
| Memory size | 4.0 MiB |
| PHY330576 | 2534 |
|---|---|
| PHY350277 | 1628 |
| PHY412132 | 1321 |
| PHY423534 | 1223 |
| PHY314027 | 1200 |
| Other values (74104) |
| Value | Count | Frequency (%) | |
| PHY330576 | 2534 | 0.5% | |
| PHY350277 | 1628 | 0.3% | |
| PHY412132 | 1321 | 0.3% | |
| PHY423534 | 1223 | 0.2% | |
| PHY314027 | 1200 | 0.2% | |
| PHY327046 | 1181 | 0.2% | |
| PHY338032 | 1158 | 0.2% | |
| PHY337425 | 1156 | 0.2% | |
| PHY357120 | 1156 | 0.2% | |
| PHY341578 | 1133 | 0.2% | |
| Other values (74099) | 502651 | 97.1% | |
| (Missing) | 1396 | 0.3% |
Frequencies of value counts
Unique
| Unique | 32687 |
|---|---|
| Unique (%) | 6.3% |
Histogram of lengths of the category
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 8.983821902 |
| Min length | 3 |
OperatingPhysician
| Distinct | 28532 |
|---|---|
| Distinct (%) | 31.5% |
| Missing | 427120 |
| Missing (%) | 82.5% |
| Memory size | 4.0 MiB |
| PHY330576 | 424 |
|---|---|
| PHY424897 | 293 |
| PHY314027 | 256 |
| PHY423534 | 250 |
| PHY357120 | 249 |
| Other values (28527) |
| Value | Count | Frequency (%) | |
| PHY330576 | 424 | 0.1% | |
| PHY424897 | 293 | 0.1% | |
| PHY314027 | 256 | < 0.1% | |
| PHY423534 | 250 | < 0.1% | |
| PHY357120 | 249 | < 0.1% | |
| PHY412132 | 245 | < 0.1% | |
| PHY327046 | 236 | < 0.1% | |
| PHY333735 | 232 | < 0.1% | |
| PHY381249 | 231 | < 0.1% | |
| PHY337425 | 226 | < 0.1% | |
| Other values (28522) | 87975 | 17.0% | |
| (Missing) | 427120 | 82.5% |
Frequencies of value counts
Unique
| Unique | 17159 |
|---|---|
| Unique (%) | 18.9% |
Histogram of lengths of the category
Length
| Max length | 9 |
|---|---|
| Median length | 3 |
| Mean length | 4.050150945 |
| Min length | 3 |
OtherPhysician
| Distinct | 44388 |
|---|---|
| Distinct (%) | 22.8% |
| Missing | 322691 |
| Missing (%) | 62.3% |
| Memory size | 4.0 MiB |
| PHY412132 | 1247 |
|---|---|
| PHY341578 | 1098 |
| PHY338032 | 1070 |
| PHY337425 | 1041 |
| PHY347064 | 806 |
| Other values (44383) |
| Value | Count | Frequency (%) | |
| PHY412132 | 1247 | 0.2% | |
| PHY341578 | 1098 | 0.2% | |
| PHY338032 | 1070 | 0.2% | |
| PHY337425 | 1041 | 0.2% | |
| PHY347064 | 806 | 0.2% | |
| PHY322092 | 771 | 0.1% | |
| PHY409965 | 744 | 0.1% | |
| PHY313818 | 730 | 0.1% | |
| PHY350277 | 682 | 0.1% | |
| PHY415321 | 678 | 0.1% | |
| Other values (44378) | 186179 | 36.0% | |
| (Missing) | 322691 | 62.3% |
Frequencies of value counts
Unique
| Unique | 24312 |
|---|---|
| Unique (%) | 12.5% |
Histogram of lengths of the category
Length
| Max length | 9 |
|---|---|
| Median length | 3 |
| Mean length | 5.260367716 |
| Min length | 3 |
DiagnosisCode_1
| Distinct | 10354 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 10453 |
| Missing (%) | 2.0% |
| Memory size | 4.0 MiB |
| 4019 | 13803 |
|---|---|
| 4011 | 12512 |
| 2724 | 3603 |
| 2720 | 3209 |
| 2722 | 3028 |
| Other values (10349) |
| Value | Count | Frequency (%) | |
| 4019 | 13803 | 2.7% | |
| 4011 | 12512 | 2.4% | |
| 2724 | 3603 | 0.7% | |
| 2720 | 3209 | 0.6% | |
| 2722 | 3028 | 0.6% | |
| 2721 | 2998 | 0.6% | |
| 2723 | 2995 | 0.6% | |
| 78651 | 2251 | 0.4% | |
| 78659 | 2181 | 0.4% | |
| 78650 | 2179 | 0.4% | |
| Other values (10344) | 458525 | 88.6% | |
| (Missing) | 10453 | 2.0% |
Frequencies of value counts
Unique
| Unique | 1212 |
|---|---|
| Unique (%) | 0.2% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 4 |
| Mean length | 4.455167392 |
| Min length | 3 |
DiagnosisCode_2
| Distinct | 5056 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 195380 |
| Missing (%) | 37.7% |
| Memory size | 4.0 MiB |
| 4019 | 19894 |
|---|---|
| 25000 | 10674 |
| 2724 | 10147 |
| V5869 | 9573 |
| V5861 | 9550 |
| Other values (5051) |
| Value | Count | Frequency (%) | |
| 4019 | 19894 | 3.8% | |
| 25000 | 10674 | 2.1% | |
| 2724 | 10147 | 2.0% | |
| V5869 | 9573 | 1.8% | |
| V5861 | 9550 | 1.8% | |
| 2449 | 5090 | 1.0% | |
| 42731 | 5052 | 1.0% | |
| 2720 | 4750 | 0.9% | |
| 4011 | 4579 | 0.9% | |
| 28521 | 4063 | 0.8% | |
| Other values (5046) | 238985 | 46.2% | |
| (Missing) | 195380 | 37.7% |
Frequencies of value counts
Unique
| Unique | 1208 |
|---|---|
| Unique (%) | 0.4% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 4 |
| Mean length | 3.917852114 |
| Min length | 3 |
DiagnosisCode_3
| Distinct | 4448 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 314480 |
| Missing (%) | 60.7% |
| Memory size | 4.0 MiB |
| 4019 | 12126 |
|---|---|
| 25000 | 6838 |
| 2724 | 6271 |
| V5869 | 6002 |
| V5861 | 4028 |
| Other values (4443) |
| Value | Count | Frequency (%) | |
| 4019 | 12126 | 2.3% | |
| 25000 | 6838 | 1.3% | |
| 2724 | 6271 | 1.2% | |
| V5869 | 6002 | 1.2% | |
| V5861 | 4028 | 0.8% | |
| 2449 | 3238 | 0.6% | |
| 2720 | 3097 | 0.6% | |
| 42731 | 2999 | 0.6% | |
| 4011 | 2844 | 0.5% | |
| 28521 | 2654 | 0.5% | |
| Other values (4438) | 153160 | 29.6% | |
| (Missing) | 314480 | 60.7% |
Frequencies of value counts
Unique
| Unique | 1115 |
|---|---|
| Unique (%) | 0.5% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.577671675 |
| Min length | 3 |
DiagnosisCode_4
| Distinct | 3925 |
|---|---|
| Distinct (%) | 3.1% |
| Missing | 392141 |
| Missing (%) | 75.7% |
| Memory size | 4.0 MiB |
| 4019 | 7088 |
|---|---|
| 25000 | 4235 |
| 2724 | 3736 |
| V5869 | 3300 |
| 2449 | 1942 |
| Other values (3920) |
| Value | Count | Frequency (%) | |
| 4019 | 7088 | 1.4% | |
| 25000 | 4235 | 0.8% | |
| 2724 | 3736 | 0.7% | |
| V5869 | 3300 | 0.6% | |
| 2449 | 1942 | 0.4% | |
| 2720 | 1940 | 0.4% | |
| V5861 | 1750 | 0.3% | |
| 42731 | 1691 | 0.3% | |
| 4011 | 1644 | 0.3% | |
| 53081 | 1449 | 0.3% | |
| Other values (3915) | 96821 | 18.7% | |
| (Missing) | 392141 | 75.7% |
Frequencies of value counts
Unique
| Unique | 1134 |
|---|---|
| Unique (%) | 0.9% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.35537155 |
| Min length | 3 |
DiagnosisCode_5
| Distinct | 3412 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 443393 |
| Missing (%) | 85.6% |
| Memory size | 4.0 MiB |
| 4019 | 4116 |
|---|---|
| 25000 | 2473 |
| 2724 | 1945 |
| V5869 | 1852 |
| 2449 | 1081 |
| Other values (3407) |
| Value | Count | Frequency (%) | |
| 4019 | 4116 | 0.8% | |
| 25000 | 2473 | 0.5% | |
| 2724 | 1945 | 0.4% | |
| V5869 | 1852 | 0.4% | |
| 2449 | 1081 | 0.2% | |
| 2720 | 1069 | 0.2% | |
| 53081 | 969 | 0.2% | |
| V5861 | 941 | 0.2% | |
| 42731 | 938 | 0.2% | |
| 4011 | 882 | 0.2% | |
| Other values (3402) | 58078 | 11.2% | |
| (Missing) | 443393 | 85.6% |
Frequencies of value counts
Unique
| Unique | 1049 |
|---|---|
| Unique (%) | 1.4% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.211261702 |
| Min length | 3 |
DiagnosisCode_6
| Distinct | 2968 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 468981 |
| Missing (%) | 90.6% |
| Memory size | 4.0 MiB |
| 4019 | 2550 |
|---|---|
| 25000 | 1595 |
| 2724 | 1169 |
| V5869 | 1106 |
| 2720 | 695 |
| Other values (2963) |
| Value | Count | Frequency (%) | |
| 4019 | 2550 | 0.5% | |
| 25000 | 1595 | 0.3% | |
| 2724 | 1169 | 0.2% | |
| V5869 | 1106 | 0.2% | |
| 2720 | 695 | 0.1% | |
| 2449 | 685 | 0.1% | |
| 42731 | 649 | 0.1% | |
| 53081 | 622 | 0.1% | |
| 496 | 573 | 0.1% | |
| V5861 | 570 | 0.1% | |
| Other values (2958) | 38542 | 7.4% | |
| (Missing) | 468981 | 90.6% |
Frequencies of value counts
Unique
| Unique | 990 |
|---|---|
| Unique (%) | 2.0% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.138778955 |
| Min length | 3 |
DiagnosisCode_7
| Distinct | 2635 |
|---|---|
| Distinct (%) | 8.0% |
| Missing | 484776 |
| Missing (%) | 93.6% |
| Memory size | 4.0 MiB |
| 4019 | 1612 |
|---|---|
| 25000 | 1003 |
| 2724 | 733 |
| V5869 | 717 |
| 2720 | 502 |
| Other values (2630) |
| Value | Count | Frequency (%) | |
| 4019 | 1612 | 0.3% | |
| 25000 | 1003 | 0.2% | |
| 2724 | 733 | 0.1% | |
| V5869 | 717 | 0.1% | |
| 2720 | 502 | 0.1% | |
| 2449 | 436 | 0.1% | |
| 53081 | 431 | 0.1% | |
| 42731 | 418 | 0.1% | |
| 496 | 391 | 0.1% | |
| 4280 | 377 | 0.1% | |
| Other values (2625) | 26341 | 5.1% | |
| (Missing) | 484776 | 93.6% |
Frequencies of value counts
Unique
| Unique | 966 |
|---|---|
| Unique (%) | 2.9% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.09373485 |
| Min length | 3 |
DiagnosisCode_8
| Distinct | 2260 |
|---|---|
| Distinct (%) | 9.9% |
| Missing | 494825 |
| Missing (%) | 95.6% |
| Memory size | 4.0 MiB |
| 4019 | 1057 |
|---|---|
| 25000 | 702 |
| 2724 | 516 |
| V5869 | 471 |
| 2720 | 325 |
| Other values (2255) |
| Value | Count | Frequency (%) | |
| 4019 | 1057 | 0.2% | |
| 25000 | 702 | 0.1% | |
| 2724 | 516 | 0.1% | |
| V5869 | 471 | 0.1% | |
| 2720 | 325 | 0.1% | |
| 2449 | 313 | 0.1% | |
| 53081 | 297 | 0.1% | |
| 42731 | 280 | 0.1% | |
| 496 | 277 | 0.1% | |
| V5861 | 262 | 0.1% | |
| Other values (2250) | 18412 | 3.6% | |
| (Missing) | 494825 | 95.6% |
Frequencies of value counts
Unique
| Unique | 844 |
|---|---|
| Unique (%) | 3.7% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.065114141 |
| Min length | 3 |
DiagnosisCode_9
| Distinct | 1894 |
|---|---|
| Distinct (%) | 12.8% |
| Missing | 502899 |
| Missing (%) | 97.1% |
| Memory size | 4.0 MiB |
| 4019 | 616 |
|---|---|
| 25000 | 468 |
| V5869 | 292 |
| 2724 | 289 |
| 2720 | 250 |
| Other values (1889) |
| Value | Count | Frequency (%) | |
| 4019 | 616 | 0.1% | |
| 25000 | 468 | 0.1% | |
| V5869 | 292 | 0.1% | |
| 2724 | 289 | 0.1% | |
| 2720 | 250 | < 0.1% | |
| 53081 | 208 | < 0.1% | |
| 496 | 185 | < 0.1% | |
| 2449 | 184 | < 0.1% | |
| V5861 | 183 | < 0.1% | |
| 4280 | 171 | < 0.1% | |
| Other values (1884) | 11992 | 2.3% | |
| (Missing) | 502899 | 97.1% |
Frequencies of value counts
Unique
| Unique | 759 |
|---|---|
| Unique (%) | 5.1% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.042253113 |
| Min length | 3 |
DiagnosisCode_10
| Distinct | 495 |
|---|---|
| Distinct (%) | 45.7% |
| Missing | 516654 |
| Missing (%) | 99.8% |
| Memory size | 4.0 MiB |
| 4019 | 41 |
|---|---|
| 25000 | 35 |
| 2720 | 17 |
| V5869 | 16 |
| 42731 | 15 |
| Other values (490) |
| Value | Count | Frequency (%) | |
| 4019 | 41 | < 0.1% | |
| 25000 | 35 | < 0.1% | |
| 2720 | 17 | < 0.1% | |
| V5869 | 16 | < 0.1% | |
| 42731 | 15 | < 0.1% | |
| 53081 | 15 | < 0.1% | |
| 2724 | 14 | < 0.1% | |
| 2449 | 13 | < 0.1% | |
| 3051 | 13 | < 0.1% | |
| E8490 | 13 | < 0.1% | |
| Other values (485) | 891 | 0.2% | |
| (Missing) | 516654 | 99.8% |
Frequencies of value counts
Unique
| Unique | 323 |
|---|---|
| Unique (%) | 29.8% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.003159906 |
| Min length | 3 |
ProcedureCode_1
| Distinct | 80 |
|---|---|
| Distinct (%) | 49.4% |
| Missing | 517575 |
| Missing (%) | > 99.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6116.611111 |
|---|---|
| Minimum | 51 |
| Maximum | 9999 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.0 MiB |
Quantile statistics
| Minimum | 51 |
|---|---|
| 5-th percentile | 70.25 |
| Q1 | 3893 |
| median | 5244.5 |
| Q3 | 9421.5 |
| 95-th percentile | 9952 |
| Maximum | 9999 |
| Range | 9948 |
| Interquartile range (IQR) | 5528.5 |
Descriptive statistics
| Standard deviation | 3217.719258 |
|---|---|
| Coefficient of variation (CV) | 0.5260624223 |
| Kurtosis | -1.101239475 |
| Mean | 6116.611111 |
| Median Absolute Deviation (MAD) | 2908.5 |
| Skewness | -0.2939338603 |
| Sum | 990891 |
| Variance | 10353717.22 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 9904 | 15 | < 0.1% | |
| 3722 | 8 | < 0.1% | |
| 4516 | 8 | < 0.1% | |
| 66 | 7 | < 0.1% | |
| 5123 | 7 | < 0.1% | |
| 9952 | 5 | < 0.1% | |
| 9672 | 5 | < 0.1% | |
| 3893 | 5 | < 0.1% | |
| 8622 | 4 | < 0.1% | |
| 3995 | 4 | < 0.1% | |
| Other values (70) | 94 | < 0.1% | |
| (Missing) | 517575 | > 99.9% |
| Value | Count | Frequency (%) | |
| 51 | 2 | < 0.1% | |
| 66 | 7 | < 0.1% | |
| 151 | 1 | < 0.1% | |
| 239 | 1 | < 0.1% | |
| 311 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 9999 | 1 | < 0.1% | |
| 9961 | 1 | < 0.1% | |
| 9955 | 4 | < 0.1% | |
| 9952 | 5 | < 0.1% | |
| 9929 | 1 | < 0.1% |
ProcedureCode_2
| Distinct | 22 |
|---|---|
| Distinct (%) | 61.1% |
| Missing | 517701 |
| Missing (%) | > 99.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4503.277778 |
|---|---|
| Minimum | 412 |
| Maximum | 9982 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.0 MiB |
Quantile statistics
| Minimum | 412 |
|---|---|
| 5-th percentile | 496 |
| Q1 | 2724 |
| median | 4019 |
| Q3 | 5849 |
| 95-th percentile | 8389.25 |
| Maximum | 9982 |
| Range | 9570 |
| Interquartile range (IQR) | 3125 |
Descriptive statistics
| Standard deviation | 2504.015 |
|---|---|
| Coefficient of variation (CV) | 0.556042759 |
| Kurtosis | -0.3033716551 |
| Mean | 4503.277778 |
| Median Absolute Deviation (MAD) | 1295 |
| Skewness | 0.4816786953 |
| Sum | 162118 |
| Variance | 6270091.121 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
| Value | Count | Frequency (%) | |
| 4019 | 6 | < 0.1% | |
| 2724 | 6 | < 0.1% | |
| 1741 | 2 | < 0.1% | |
| 496 | 2 | < 0.1% | |
| 5849 | 2 | < 0.1% | |
| 7820 | 2 | < 0.1% | |
| 3811 | 1 | < 0.1% | |
| 2731 | 1 | < 0.1% | |
| 4439 | 1 | < 0.1% | |
| 4571 | 1 | < 0.1% | |
| Other values (12) | 12 | < 0.1% | |
| (Missing) | 517701 | > 99.9% |
| Value | Count | Frequency (%) | |
| 412 | 1 | < 0.1% | |
| 496 | 2 | < 0.1% | |
| 1741 | 2 | < 0.1% | |
| 2724 | 6 | < 0.1% | |
| 2731 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 9982 | 1 | < 0.1% | |
| 9971 | 1 | < 0.1% | |
| 7862 | 1 | < 0.1% | |
| 7840 | 1 | < 0.1% | |
| 7820 | 2 | < 0.1% |
ProcedureCode_3
| Distinct | 4 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 517733 |
| Missing (%) | > 99.9% |
| Memory size | 4.0 MiB |
| 412 | |
|---|---|
| 2724 | |
| 4401 | |
| 4299 |
| Value | Count | Frequency (%) | |
| 412 | 1 | < 0.1% | |
| 2724 | 1 | < 0.1% | |
| 4401 | 1 | < 0.1% | |
| 4299 | 1 | < 0.1% | |
| (Missing) | 517733 | > 99.9% |
Frequencies of value counts
Unique
| Unique | 4 |
|---|---|
| Unique (%) | 100.0% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 3 |
| Mean length | 3.000021246 |
| Min length | 3 |
ProcedureCode_4
| Distinct | 2 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 517735 |
| Missing (%) | > 99.9% |
| Memory size | 4.0 MiB |
| 7840 | |
|---|---|
| 311 |
| Value | Count | Frequency (%) | |
| 7840 | 1 | < 0.1% | |
| 311 | 1 | < 0.1% | |
| (Missing) | 517735 | > 99.9% |
Frequencies of value counts
Unique
| Unique | 2 |
|---|---|
| Unique (%) | 100.0% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 3 |
| Mean length | 3.000009657 |
| Min length | 3 |
DeductibleAmt
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.779233472 |
|---|---|
| Minimum | 0 |
| Maximum | 897 |
| Zeros | 496701 |
| Zeros (%) | 95.9% |
| Memory size | 4.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 897 |
| Range | 897 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 15.78583932 |
|---|---|
| Coefficient of variation (CV) | 5.67992559 |
| Kurtosis | 180.6903587 |
| Mean | 2.779233472 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.735340013 |
| Sum | 1438912 |
| Variance | 249.1927229 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=16)
| Value | Count | Frequency (%) | |
| 0 | 496701 | 95.9% | |
| 100 | 4582 | 0.9% | |
| 70 | 2420 | 0.5% | |
| 60 | 2065 | 0.4% | |
| 40 | 2045 | 0.4% | |
| 80 | 2024 | 0.4% | |
| 50 | 1969 | 0.4% | |
| 20 | 1406 | 0.3% | |
| 30 | 1336 | 0.3% | |
| 90 | 1245 | 0.2% | |
| Other values (6) | 1944 | 0.4% |
| Value | Count | Frequency (%) | |
| 0 | 496701 | 95.9% | |
| 10 | 1203 | 0.2% | |
| 20 | 1406 | 0.3% | |
| 30 | 1336 | 0.3% | |
| 40 | 2045 | 0.4% |
| Value | Count | Frequency (%) | |
| 897 | 2 | < 0.1% | |
| 886 | 1 | < 0.1% | |
| 876 | 2 | < 0.1% | |
| 865 | 2 | < 0.1% | |
| 200 | 734 | 0.1% |
AdmitDiagnosisCode
| Distinct | 3715 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 412312 |
| Missing (%) | 79.6% |
| Memory size | 4.0 MiB |
| V7612 | 4074 |
|---|---|
| 42731 | 3001 |
| 4019 | 2627 |
| 25000 | 2346 |
| V5883 | 1871 |
| Other values (3710) |
| Value | Count | Frequency (%) | |
| V7612 | 4074 | 0.8% | |
| 42731 | 3001 | 0.6% | |
| 4019 | 2627 | 0.5% | |
| 25000 | 2346 | 0.5% | |
| V5883 | 1871 | 0.4% | |
| 7295 | 1616 | 0.3% | |
| 78900 | 1550 | 0.3% | |
| V5861 | 1536 | 0.3% | |
| 2724 | 1506 | 0.3% | |
| 7242 | 1432 | 0.3% | |
| Other values (3705) | 83866 | 16.2% | |
| (Missing) | 412312 | 79.6% |
Frequencies of value counts
Unique
| Unique | 1106 |
|---|---|
| Unique (%) | 1.0% |
Histogram of lengths of the category
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.311370445 |
| Min length | 3 |
Correlations
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
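As a minimal sketch of that definition (not the report generator's own implementation), the function below divides the covariance by the product of the standard deviations using only the standard library. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the variances cancel, so sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
# Shifting and positively rescaling y leaves r unchanged (location/scale invariance).
print(pearson_r(x, y), pearson_r(x, [3 * v + 7 for v in y]))
```

Missing values are not handled here; pairs containing them would need to be dropped before calling the function.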
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
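The rank-based calculation can be sketched as follows: replace each value by its rank (tied values share the average of their ranks, which is an assumption about tie handling), then apply the covariance-over-standard-deviations formula to the ranks. The helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # nonlinear but strictly increasing
print(spearman_rho(x, y))
```

Because only the ordering matters, the quadratic relationship in the usage example is treated as a perfect monotonic association.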
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
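The pair-counting description corresponds to the simplest form of the statistic, often called tau-a. The sketch below implements exactly that form with a quadratic-time loop; library implementations commonly use a tie-adjusted variant (tau-b), so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y; the pair counts for neither side
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs
```

For columns as long as those in this report (517737 rows), a quadratic loop would be impractical; this version is meant only to make the definition concrete.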
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
First rows
| | BID | CID | StartDt | EndDt | PID | AmtReimbursed | AttendingPhysician | OperatingPhysician | OtherPhysician | DiagnosisCode_1 | DiagnosisCode_2 | DiagnosisCode_3 | DiagnosisCode_4 | DiagnosisCode_5 | DiagnosisCode_6 | DiagnosisCode_7 | DiagnosisCode_8 | DiagnosisCode_9 | DiagnosisCode_10 | ProcedureCode_1 | ProcedureCode_2 | ProcedureCode_3 | ProcedureCode_4 | ProcedureCode_5 | ProcedureCode_6 | DeductibleAmt | AdmitDiagnosisCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BENE11002 | CLM624349 | 2009-10-11 | 2009-10-11 | PRV56011 | 30 | PHY326117 | NaN | NaN | 78943 | V5866 | V1272 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 56409 |
| 1 | BENE11003 | CLM189947 | 2009-02-12 | 2009-02-12 | PRV57610 | 80 | PHY362868 | NaN | NaN | 6115 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 79380 |
| 2 | BENE11003 | CLM438021 | 2009-06-27 | 2009-06-27 | PRV57595 | 10 | PHY328821 | NaN | NaN | 2723 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 3 | BENE11004 | CLM121801 | 2009-01-06 | 2009-01-06 | PRV56011 | 40 | PHY334319 | NaN | NaN | 71988 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 4 | BENE11004 | CLM150998 | 2009-01-22 | 2009-01-22 | PRV56011 | 200 | PHY403831 | NaN | NaN | 82382 | 30000 | 72887 | 4280 | 7197 | V4577 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 71947 |
| 5 | BENE11004 | CLM173224 | 2009-02-03 | 2009-02-03 | PRV56011 | 20 | PHY339887 | NaN | NaN | 20381 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 6 | BENE11004 | CLM224741 | 2009-03-03 | 2009-03-03 | PRV56011 | 40 | PHY345721 | NaN | NaN | V6546 | 4280 | 2449 | V854 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 7 | BENE11004 | CLM252512 | 2009-03-18 | 2009-03-18 | PRV56011 | 200 | PHY346833 | NaN | PHY346833 | 72290 | 7245 | 71945 | 71695 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 8 | BENE11004 | CLM322683 | 2009-04-25 | 2009-05-15 | PRV56011 | 60 | PHY372925 | NaN | PHY311407 | 71856 | 7265 | V1254 | 7295 | 72751 | 4019 | 9597 | 8449 | 71596 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 9 | BENE11004 | CLM339500 | 2009-05-04 | 2009-05-16 | PRV56011 | 500 | PHY412904 | NaN | PHY396473 | 7237 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
Last rows
| | BID | CID | StartDt | EndDt | PID | AmtReimbursed | AttendingPhysician | OperatingPhysician | OtherPhysician | DiagnosisCode_1 | DiagnosisCode_2 | DiagnosisCode_3 | DiagnosisCode_4 | DiagnosisCode_5 | DiagnosisCode_6 | DiagnosisCode_7 | DiagnosisCode_8 | DiagnosisCode_9 | DiagnosisCode_10 | ProcedureCode_1 | ProcedureCode_2 | ProcedureCode_3 | ProcedureCode_4 | ProcedureCode_5 | ProcedureCode_6 | DeductibleAmt | AdmitDiagnosisCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 517727 | BENE159198 | CLM255268 | 2009-03-19 | 2009-03-19 | PRV53672 | 70 | PHY317739 | PHY317739 | PHY423886 | 5929 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 517728 | BENE159198 | CLM275604 | 2009-03-30 | 2009-04-19 | PRV53699 | 50 | PHY380182 | NaN | NaN | 71899 | 73392 | 71516 | 73342 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 71946 |
| 517729 | BENE159198 | CLM310720 | 2009-04-18 | 2009-05-08 | PRV53670 | 0 | PHY329971 | NaN | NaN | 29561 | V5869 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 29570 |
| 517730 | BENE159198 | CLM347778 | 2009-05-08 | 2009-05-08 | PRV53676 | 80 | PHY361063 | NaN | NaN | 30279 | V5869 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 517731 | BENE159198 | CLM400395 | 2009-06-06 | 2009-06-06 | PRV53699 | 100 | PHY380182 | NaN | PHY385752 | 29212 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 517732 | BENE159198 | CLM510792 | 2009-08-06 | 2009-08-06 | PRV53699 | 800 | PHY364188 | PHY364188 | PHY385752 | 2163 | V4575 | 53190 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 517733 | BENE159198 | CLM551294 | 2009-08-29 | 2009-08-29 | PRV53702 | 400 | PHY423019 | PHY332284 | NaN | 07041 | 5781 | 25000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 517734 | BENE159198 | CLM596444 | 2009-09-24 | 2009-09-24 | PRV53676 | 60 | PHY361063 | NaN | NaN | V570 | 78079 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 517735 | BENE159198 | CLM636992 | 2009-10-18 | 2009-10-18 | PRV53689 | 70 | PHY403198 | NaN | PHY419379 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
| 517736 | BENE159198 | CLM686139 | 2009-11-17 | 2009-11-18 | PRV53689 | 80 | PHY419379 | NaN | PHY419379 | 78900 | 78609 | 4280 | 71946 | 3310 | 75311 | 2724 | V103 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |